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Detailed protocol 


SNP and genotype calling 


# The initial input files for this protocol are the outputs from RealignerTargetCreator and IndelRealigner in Genome Analysis Toolkit (GATK), generating for 
each sample a realigned BAM file. 


Step 1: For each sample, use HaplotypeCaller in GATK to generate per-sample genomic variants call formats (GVCFs). The command is as follows: 
gatk --java-options "-Xmx4g" HaplotypeCaller -R REFERENCE fasta -l input.bam -O output.g.vcf.gz -ERC GVCF 


Step 2: Then the GVCFs are passed to GenotypeGVCFs in GATK, generating a raw VCF file in which all samples have been jointly genotyped. The command 
is as follows: 


gatk --java-options "-Xmx4g" GenotypeGVCFs -R REFERENCE fasta -V input1.g.vcf.gz -V input2.g.vcf.gz ... -O raw.vcf.gz 

Note: The GATK4 GenotypeGVCFs tool can take only one input track. Options are 1) a single single-sample GVCF 2) a single multi-sample GVCF created by 
CombineGVCFs or 3) a GenomicsDB workspace created by GenomicsDBImport. A sample-level GVCF is produced by HaplotypeCaller with the --ERC 
GVCF' setting. 


Step 3: Filter SNPs using VCFTOOLS 
veftools --gzvcf raw.vcf.gz --maf 0.05 --min-meanDP 4 --max-meanDP 50 --max-missing 0.7--remove-filtered-all -- --minGQ 10 --min-alleles 2 --max-alleles 2 -- 
remove-indels --recode -c | gzip -c > filterSNP.vcf.gz 


Related files 


SNP and genotype calling.docx eo) 
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